Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chengyu Tao

Align3D-AD: Cross-Modal Feature Alignment and Dual-Prompt Learning for Zero-shot 3D Anomaly Detection

May 07, 2026

Letian Bai, Xuanming Cao, Juan Du, Chengyu Tao

Abstract:Zero-shot 3D anomaly detection aims to identify anomalies without access to training data from target categories. However, existing methods mainly rely on projecting 3D observations into multi-view representations that primarily capture geometric cues rather than realistic visual semantics and process them with vision encoders pretrained on RGB data, leading to a significant domain gap between the encoder and the projected representations. To address this issue, we propose Align3D-AD, a unified two-stage framework that leverages the RGB modality from auxiliary categories as cross-modal guidance for zero-shot 3D anomaly detection. First, we introduce a cross-modal feature alignment paradigm that maps rendering features into the RGB semantic space. Unlike prior works that implicitly rely on pretrained encoders, our method enables direct semantic transfer from RGB observations. A semantic consistency reweighting strategy is further introduced to refine feature alignment by reweighting local regions according to holistic semantic consistency. Second, we propose a modality-aware prompt learning framework with dual-prompt contrastive alignment. By assigning independent prompts to RGB-aligned and rendering features, our method captures complementary semantics across modalities, while the contrastive alignment further enhances prompt representations to improve discriminability. Extensive experiments on MVTec3D-AD, Eyecandies, and Real3D-AD demonstrate that Align3D-AD consistently outperforms existing zero-shot methods under both one-vs-rest and cross-dataset settings, highlighting its generalization capability and robustness. Code and the dataset will be made available once our paper is accepted.

Via

Access Paper or Ask Questions

FORGE:Fine-grained Multimodal Evaluation for Manufacturing Scenarios

Apr 08, 2026

Xiangru Jian, Hao Xu, Wei Pang, Xinjian Zhao, Chengyu Tao, Qixin Zhang, Xikun Zhang, Chao Zhang, Guanzhi Deng, Alex Xue(+6 more)

Abstract:The manufacturing sector is increasingly adopting Multimodal Large Language Models (MLLMs) to transition from simple perception to autonomous execution, yet current evaluations fail to reflect the rigorous demands of real-world manufacturing environments. Progress is hindered by data scarcity and a lack of fine-grained domain semantics in existing datasets. To bridge this gap, we introduce FORGE. Wefirst construct a high-quality multimodal dataset that combines real-world 2D images and 3D point clouds, annotated with fine-grained domain semantics (e.g., exact model numbers). We then evaluate 18 state-of-the-art MLLMs across three manufacturing tasks, namely workpiece verification, structural surface inspection, and assembly verification, revealing significant performance gaps. Counter to conventional understanding, the bottleneck analysis shows that visual grounding is not the primary limiting factor. Instead, insufficient domain-specific knowledge is the key bottleneck, setting a clear direction for future research. Beyond evaluation, we show that our structured annotations can serve as an actionable training resource: supervised fine-tuning of a compact 3B-parameter model on our data yields up to 90.8% relative improvement in accuracy on held-out manufacturing scenarios, providing preliminary evidence for a practical pathway toward domain-adapted manufacturing MLLMs. The code and datasets are available at https://ai4manufacturing.github.io/forge-web.

* Project Page:https://ai4manufacturing.github.io/forge-web

Via

Access Paper or Ask Questions

SGANet: Semantic and Geometric Alignment for Multimodal Multi-view Anomaly Detection

Apr 07, 2026

Letian Bai, Chengyu Tao, Juan Du

Abstract:Multi-view anomaly detection aims to identify surface defects on complex objects using observations captured from multiple viewpoints. However, existing unsupervised methods often suffer from feature inconsistency arising from viewpoint variations and modality discrepancies. To address these challenges, we propose a Semantic and Geometric Alignment Network (SGANet), a unified framework for multimodal multi-view anomaly detection that effectively combines semantic and geometric alignment to learn physically coherent feature representations across viewpoints and modalities. SGANet consists of three key components. The Selective Cross-view Feature Refinement Module (SCFRM) selectively aggregates informative patch features from adjacent views to enhance cross-view feature interaction. The Semantic-Structural Patch Alignment (SSPA) enforces semantic alignment across modalities while maintaining structural consistency under viewpoint transformations. The Multi-View Geometric Alignment (MVGA) further aligns geometrically corresponding patches across viewpoints. By jointly modeling feature interaction, semantic and structural consistency, and global geometric correspondence, SGANet effectively enhances anomaly detection performance in multimodal multi-view settings. Extensive experiments on the SiM3D and Eyecandies datasets demonstrate that SGANet achieves state-of-the-art performance in both anomaly detection and localization, validating its effectiveness in realistic industrial scenarios.

Via

Access Paper or Ask Questions

G$^{2}$SF-MIAD: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection

Mar 13, 2025

Chengyu Tao, Xuanming Cao, Juan Du

$Figure 1 for G$^{2}$SF-MIAD: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection$

$Figure 2 for G$^{2}$SF-MIAD: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection$

$Figure 3 for G$^{2}$SF-MIAD: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection$

$Figure 4 for G$^{2}$SF-MIAD: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection$

Abstract:Industrial quality inspection plays a critical role in modern manufacturing by identifying defective products during production. While single-modality approaches using either 3D point clouds or 2D RGB images suffer from information incompleteness, multimodal anomaly detection offers promise through the complementary fusion of crossmodal data. However, existing methods face challenges in effectively integrating unimodal results and improving discriminative power. To address these limitations, we first reinterpret memory bank-based anomaly scores in single modalities as isotropic Euclidean distances in local feature spaces. Dynamically evolving from Eulidean metrics, we propose a novel \underline{G}eometry-\underline{G}uided \underline{S}core \underline{F}usion (G$^{2}$SF) framework that progressively learns an anisotropic local distance metric as a unified score for the fusion task. Through a geometric encoding operator, a novel Local Scale Prediction Network (LSPN) is proposed to predict direction-aware scaling factors that characterize first-order local feature distributions, thereby enhancing discrimination between normal and anomalous patterns. Additionally, we develop specialized loss functions and score aggregation strategy from geometric priors to ensure both metric generalization and efficacy. Comprehensive evaluations on the MVTec-3D AD dataset demonstrate the state-of-the-art detection performance of our method with low positive rate and better recall, which is essential in industrial application, and detailed ablation analysis validates each component's contribution.

Via

Access Paper or Ask Questions

Deep Subspace Learning for Surface Anomaly Classification Based on 3D Point Cloud Data

Feb 17, 2025

Xuanming Cao, Chengyu Tao, Juan Du

Figure 1 for Deep Subspace Learning for Surface Anomaly Classification Based on 3D Point Cloud Data

Figure 2 for Deep Subspace Learning for Surface Anomaly Classification Based on 3D Point Cloud Data

Figure 3 for Deep Subspace Learning for Surface Anomaly Classification Based on 3D Point Cloud Data

Figure 4 for Deep Subspace Learning for Surface Anomaly Classification Based on 3D Point Cloud Data

Abstract:Surface anomaly classification is critical for manufacturing system fault diagnosis and quality control. However, the following challenges always hinder accurate anomaly classification in practice: (i) Anomaly patterns exhibit intra-class variation and inter-class similarity, presenting challenges in the accurate classification of each sample. (ii) Despite the predefined classes, new types of anomalies can occur during production that require to be detected accurately. (iii) Anomalous data is rare in manufacturing processes, leading to limited data for model learning. To tackle the above challenges simultaneously, this paper proposes a novel deep subspace learning-based 3D anomaly classification model. Specifically, starting from a lightweight encoder to extract the latent representations, we model each class as a subspace to account for the intra-class variation, while promoting distinct subspaces of different classes to tackle the inter-class similarity. Moreover, the explicit modeling of subspaces offers the capability to detect out-of-distribution samples, i.e., new types of anomalies, and the regularization effect with much fewer learnable parameters of our proposed subspace classifier, compared to the popular Multi-Layer Perceptions (MLPs). Extensive numerical experiments demonstrate our method achieves better anomaly classification results than benchmark methods, and can effectively identify the new types of anomalies.

Via

Access Paper or Ask Questions

A Novel Representation of Periodic Pattern and Its Application to Untrained Anomaly Detection

Sep 09, 2024

Peng Ye, Chengyu Tao, Juan Du

Figure 1 for A Novel Representation of Periodic Pattern and Its Application to Untrained Anomaly Detection

Figure 2 for A Novel Representation of Periodic Pattern and Its Application to Untrained Anomaly Detection

Figure 3 for A Novel Representation of Periodic Pattern and Its Application to Untrained Anomaly Detection

Figure 4 for A Novel Representation of Periodic Pattern and Its Application to Untrained Anomaly Detection

Abstract:There are a variety of industrial products that possess periodic textures or surfaces, such as carbon fiber textiles and display panels. Traditional image-based quality inspection methods for these products require identifying the periodic patterns from normal images (without anomaly and noise) and subsequently detecting anomaly pixels with inconsistent appearances. However, it remains challenging to accurately extract the periodic pattern from a single image in the presence of unknown anomalies and measurement noise. To deal with this challenge, this paper proposes a novel self-representation of the periodic image defined on a set of continuous parameters. In this way, periodic pattern learning can be embedded into a joint optimization framework, which is named periodic-sparse decomposition, with simultaneously modeling the sparse anomalies and Gaussian noise. Finally, for the real-world industrial images that may not strictly satisfy the periodic assumption, we propose a novel pixel-level anomaly scoring strategy to enhance the performance of anomaly detection. Both simulated and real-world case studies demonstrate the effectiveness of the proposed methodology for periodic pattern learning and anomaly detection.

Via

Access Paper or Ask Questions

F2PAD: A General Optimization Framework for Feature-Level to Pixel-Level Anomaly Detection

Jul 09, 2024

Chengyu Tao, Hao Xu, Juan Du

Figure 1 for F2PAD: A General Optimization Framework for Feature-Level to Pixel-Level Anomaly Detection

Figure 2 for F2PAD: A General Optimization Framework for Feature-Level to Pixel-Level Anomaly Detection

Figure 3 for F2PAD: A General Optimization Framework for Feature-Level to Pixel-Level Anomaly Detection

Figure 4 for F2PAD: A General Optimization Framework for Feature-Level to Pixel-Level Anomaly Detection

Abstract:Image-based inspection systems have been widely deployed in manufacturing production lines. Due to the scarcity of defective samples, unsupervised anomaly detection that only leverages normal samples during training to detect various defects is popular. Existing feature-based methods, utilizing deep features from pretrained neural networks, show their impressive performance in anomaly localization and the low demand for the sample size for training. However, the detected anomalous regions of these methods always exhibit inaccurate boundaries, which impedes the downstream tasks. This deficiency is caused: (i) The decreased resolution of high-level features compared with the original image, and (ii) The mixture of adjacent normal and anomalous pixels during feature extraction. To address them, we propose a novel unified optimization framework (F2PAD) that leverages the Feature-level information to guide the optimization process for Pixel-level Anomaly Detection in the inference stage. The proposed framework is universal and plug-and-play, which can enhance various feature-based methods with limited assumptions. Case studies are provided to demonstrate the effectiveness of our strategy, particularly when applied to three popular backbone methods: PaDiM, CFLOW-AD, and PatchCore.

Via

Access Paper or Ask Questions

3D-CSAD: Untrained 3D Anomaly Detection for Complex Manufacturing Surfaces

Apr 11, 2024

Xuanming Cao, Chengyu Tao, Juan Du

Figure 1 for 3D-CSAD: Untrained 3D Anomaly Detection for Complex Manufacturing Surfaces

Figure 2 for 3D-CSAD: Untrained 3D Anomaly Detection for Complex Manufacturing Surfaces

Figure 3 for 3D-CSAD: Untrained 3D Anomaly Detection for Complex Manufacturing Surfaces

Figure 4 for 3D-CSAD: Untrained 3D Anomaly Detection for Complex Manufacturing Surfaces

Abstract:The surface quality inspection of manufacturing parts based on 3D point cloud data has attracted increasing attention in recent years. The reason is that the 3D point cloud can capture the entire surface of manufacturing parts, unlike the previous practices that focus on some key product characteristics. However, achieving accurate 3D anomaly detection is challenging, due to the complex surfaces of manufacturing parts and the difficulty of collecting sufficient anomaly samples. To address these challenges, we propose a novel untrained anomaly detection method based on 3D point cloud data for complex manufacturing parts, which can achieve accurate anomaly detection in a single sample without training data. In the proposed framework, we transform an input sample into two sets of profiles along different directions. Based on one set of the profiles, a novel segmentation module is devised to segment the complex surface into multiple basic and simple components. In each component, another set of profiles, which have the nature of similar shapes, can be modeled as a low-rank matrix. Thus, accurate 3D anomaly detection can be achieved by using Robust Principal Component Analysis (RPCA) on these low-rank matrices. Extensive numerical experiments on different types of parts show that our method achieves promising results compared with the benchmark methods.

Via

Access Paper or Ask Questions